ABSTRACT

Thresholds for the parameter AD*, which decides on the
splitting of a cluster in the ISODATA program, are
discussed. For the univariate case, 0.84 is established
as a sound threshold, after testing on some typical
distributions, simple as well as composite, and
evaluating the probability of misclassification.
Extension to the multivariate case leads to the
empirical value of $(N - 0.16)/\sqrt{N}$, where N denotes
the dimension of the vector space. A critical
examination of the values of AD, especially for
large N, results in the conclusion that AD is
not a very effective measure for the present
purpose.

*AD is the acronym for "average distance". Its exact
definition is found in equation (1) of Section 1.
1. INTRODUCTION

The iterative unsupervised clustering program ISODATA
proposed by Ball and Hall [1,2,3] has been very well received.
Application of this technique toward the eventual goal of
classification of multivariate statistical data has met with
various levels of success in different organizations and in
diversified disciplines such as remote sensing. One major
difficulty in using ISODATA* is the lack of universally
acknowledged thresholds for the various parameters that appear
in the program. Although experience can suggest appropriate
values to the programmer, it is certainly necessary to
establish analytically some sound values for these
thresholds. It is the intent of this paper to do exactly this
in regard to the splitting of clusters.

The idea of ISODATA can be summarized verbally, without
going into its mathematical details, as follows. The goal of
the procedure is to separate all the statistical data into
classes, or clusters, each having its average point (i.e.,
mean) as its representative point. At each intermediate step,
a new assignment of points is made by grouping or splitting
the old clusters. The decisions on the assignment of points to
different clusters, on the splitting and on the combining of
clusters depend on some distance function and on parameters
derived from these distance measures. The program is started
by an arbitrary (or some preferred) initialization of cluster
centers, and ends when the clusters remain invariant under
additional iterations.

*The present discussion is primarily on ISODATA-POINTS [3].
A measure of the tightness of packing of points
$\{X_k\}_{k=1}^{n}$, $X_k = (X_{k1}, X_{k2}, \ldots, X_{kN})^T \in R^N$,
in a cluster C is the 'average distance' AD defined by

$$AD = \frac{1}{n} \sum_{k=1}^{n} d(X_k, \mu), \tag{1}$$

where $\mu = (\mu_1, \mu_2, \ldots, \mu_N)^T$ is the representative
point of C, i.e., the mean

$$\mu = \frac{1}{n} \sum_{k=1}^{n} X_k, \tag{2}$$

and $d(\cdot,\cdot): R^N \times R^N \to R$ is a distance function

$$d^2(X, Y) = \sum_{i=1}^{N} w_i (X_i - Y_i)^2, \qquad w_i \in R. \tag{3}$$

A commonly* used set of weights $\{w_i\}$ is

$$w_i = \frac{1}{\sigma_i^2}, \qquad \sigma_i \neq 0, \tag{4}$$

where $\sigma_i^2$ is assumed to be nonzero and denotes the variance
of the cluster in the ith coordinate axis

$$\sigma_i^2 = \frac{1}{n} \sum_{k=1}^{n} (X_{ki} - \mu_i)^2. \tag{5}$$

*Too many people in various disciplines have used this
measure. Thus, no tracing to its original user is possible.
This $\{w_i\}$ is a reasonable choice because of the
standardization or 'sphericalization' of the cluster C. Let T
be the threshold. The rule to decide on the splitting of a
cluster is:

0. If $AD \ge T$, split the cluster.

1. Otherwise, no splitting.
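The definition (1)-(5) and the splitting rule can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `average_distance` and `should_split` and the two sample clusters are hypothetical and do not come from the ISODATA program itself.

```python
import math

def average_distance(points):
    """AD of a cluster per (1)-(5): mean weighted distance to the cluster
    mean, with weights w_i = 1 / sigma_i^2 (population variance per axis)."""
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(dim)]
    var = [sum((p[i] - mean[i]) ** 2 for p in points) / n for i in range(dim)]
    if any(v == 0.0 for v in var):
        raise ValueError("every coordinate needs a nonzero variance")
    total = 0.0
    for p in points:
        total += math.sqrt(
            sum((p[i] - mean[i]) ** 2 / var[i] for i in range(dim))
        )
    return total / n

def should_split(points, threshold):
    """Splitting rule: split when AD >= T."""
    return average_distance(points) >= threshold

# Hypothetical one-dimensional clusters (each point is a 1-tuple).
two_spikes = [(-1.0,), (1.0,)] * 50            # two well separated groups
bell = [(x,) for x in (-2, -1, -1, 0, 0, 0, 0, 1, 1, 2)]  # single hump

print(average_distance(two_spikes), should_split(two_spikes, 0.84))
print(round(average_distance(bell), 3), should_split(bell, 0.84))
```

The threshold 0.84 used here is the univariate value argued for later in the paper.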

The following further examines this parameter AD. The bound
$\sqrt{N}$ is established, i.e., AD is proved to be $\le \sqrt{N}$.
The values of AD for univariate normal, triangular,
trapezoidal, rectangular and 'bi-spiked' distributions are
computed. Furthermore, different composite univariate
distributions, each made up of two normal distributions with
various variances and a priori probabilities, are studied.
The probabilities of misclassification using the Bayes
decision rule for the composite distributions are also
computed for the case when a cluster is decided to be split
into two, each assuming a normal distribution, and the points
are to be reassigned. These considerations lead to the
conclusion that the value 0.84 of AD is a sound choice for the
threshold to split a cluster in ISODATA for the univariate
case. While the value $(N - 0.16)/\sqrt{N}$ of AD is
extrapolated as an empirical threshold for the multivariate
case, its effectiveness as a critical, discriminative
parameter is considerably reduced when it is noticed that AD
approaches its bound $\sqrt{N}$ fairly rapidly with N. In fact,
for N large, the AD for any reasonably smooth distribution is
shown to approach $\sqrt{N}$, which makes it impossible to decide
whether the distribution is simple or composite, i.e.,
whether to split the cluster or not.
2. SOME PROPERTIES OF AD

The intimacy between the tightness of a cluster and its
AD has been suggested in the previous section. To further
study this association, some properties of AD are obtained
here.

2.1 Upper Bound of AD

Fact: With the definition of AD through (1)-(5), $AD \le \sqrt{N}$.

Proof:

$$AD^2 = \Big[\frac{1}{n} \sum_{k=1}^{n} d(X_k, \mu)\Big]^2
\le \frac{1}{n} \sum_{k=1}^{n} d^2(X_k, \mu) \qquad \text{by the Lemma (see below)}$$

$$= \frac{1}{n} \sum_{k=1}^{n} \sum_{i=1}^{N} \frac{(X_{ki} - \mu_i)^2}{\sigma_i^2}, \qquad \sigma_i \neq 0$$

$$= N.$$
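The bound can also be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original analysis: the clusters are hypothetical, drawn with Python's `random.gauss` using independent coordinates of unequal spread, and `average_distance` is an illustrative implementation of (1)-(5).

```python
import math
import random

def average_distance(points):
    """AD per (1)-(5) with weights 1/sigma_i^2 (population variances)."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(dim)]
    var = [sum((p[i] - mean[i]) ** 2 for p in points) / n for i in range(dim)]
    return sum(
        math.sqrt(sum((p[i] - mean[i]) ** 2 / var[i] for i in range(dim)))
        for p in points
    ) / n

random.seed(0)  # fixed seed so the run is repeatable
results = {}
for dim in (1, 2, 5, 20):
    # Hypothetical test cluster: 500 points, coordinate i has spread 1 + i.
    pts = [[random.gauss(0.0, 1.0 + i) for i in range(dim)] for _ in range(500)]
    results[dim] = average_distance(pts)

for dim, ad in results.items():
    print(dim, round(ad, 3), "<=", round(math.sqrt(dim), 3))
```

Because the weights are computed from the same sample, the inequality holds for every data set with nonzero variances, not only for Gaussian ones.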
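Lemma

$$\Big[\frac{1}{N} \sum_{k=1}^{N} x_k\Big]^2 \le \frac{1}{N} \sum_{k=1}^{N} x_k^2, \qquad x_k \in R.$$

Proof: (By mathematical induction)

It suffices to prove the Lemma for the case when $x_k \in R^+$.

(a) N = 2:

$$\text{L.H.S.} = \Big[\frac{x_1 + x_2}{2}\Big]^2 = \frac{1}{4}\big(x_1^2 + x_2^2 + 2 x_1 x_2\big).$$

Since

$$x_1^2 + x_2^2 \ge 2 x_1 x_2, \tag{6}$$

$$\text{L.H.S.} \le \frac{1}{4}\big[(x_1^2 + x_2^2) + (x_1^2 + x_2^2)\big] = \frac{1}{2}(x_1^2 + x_2^2) = \text{R.H.S.}$$

(b) Assume the Lemma is true for N = M, check it for M+1:

$$\Big[\frac{1}{M+1} \sum_{k=1}^{M+1} x_k\Big]^2
= \frac{1}{(M+1)^2}\big[y + x_{M+1}\big]^2
= \frac{1}{(M+1)^2}\big[y^2 + 2 y x_{M+1} + x_{M+1}^2\big],$$

where $y = \sum_{k=1}^{M} x_k$, and the Lemma applied for N = M gives

$$y^2 \le M \sum_{k=1}^{M} x_k^2. \tag{7}$$

Since

$$\frac{1}{M} y^2 + M x_{M+1}^2 \ge 2 y x_{M+1}, \tag{8}$$

$$\Big[\frac{1}{M+1} \sum_{k=1}^{M+1} x_k\Big]^2
\le \frac{1}{(M+1)^2}\Big[y^2 + \frac{1}{M} y^2 + M x_{M+1}^2 + x_{M+1}^2\Big]
= \frac{1}{M+1}\Big[\frac{1}{M} y^2 + x_{M+1}^2\Big].$$

That is, by (7),

$$\Big[\frac{1}{M+1} \sum_{k=1}^{M+1} x_k\Big]^2
\le \frac{1}{M+1}\Big[\sum_{k=1}^{M} x_k^2 + x_{M+1}^2\Big]
= \frac{1}{M+1} \sum_{k=1}^{M+1} x_k^2.$$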

Observations

1. The equality in the Lemma holds when $|x_i| = |x_j|$
for all $i \neq j$ (with the $x_k$ nonnegative, as in the proof).

2. Thus, the upper bound $\sqrt{N}$ is attained by AD when
$d(X_i, \mu) = d(X_j, \mu)$ for all $i \neq j$.


2.2 AD For Some Simple Univariate Distributions

Assumption 1: N is assumed to be 1 in this subsection
through Section 4. Extension to the case N > 1 is
relegated to Section 5.

Assumption 2: Enough data points are assumed to be available
such that the histograms of these data points will reproduce
the assumed distribution.

Notations: In the following, $p(\cdot)$, $\mu$, $\sigma$, AD will denote
the probability density function, mean, standard deviation
and the average distance (defined through (1)-(5)),
respectively.


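(a) Normal distribution $N(0, \sigma^2)$

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2},$$

with mean $\mu = 0$ and standard deviation $\sigma$, so that

$$AD = \frac{1}{\sigma} \int_{-\infty}^{\infty} |x|\, p(x)\, dx
= \frac{2}{\sqrt{2\pi}\,\sigma^2} \int_{0}^{\infty} x\, e^{-x^2/2\sigma^2}\, dx
= \frac{2}{\sqrt{2\pi}} \approx 0.80. \tag{9}$$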
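As a quick numerical cross-check of (9), the integral can be evaluated with a plain midpoint rule. This is a minimal sketch; the function name `ad_normal`, the integration range of ten standard deviations and the step count are arbitrary illustrative choices.

```python
import math

def ad_normal(sigma, half_width=10.0, steps=20_000):
    """Midpoint-rule estimate of (1/sigma) * integral of |x| p(x) dx
    for the normal density N(0, sigma^2)."""
    lo = -half_width * sigma
    hi = half_width * sigma
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        p = math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)
        total += abs(x) * p * h
    return total / sigma

closed_form = 2.0 / math.sqrt(2.0 * math.pi)   # right-hand side of (9)
print(round(closed_form, 4))
print(round(ad_normal(1.0), 4), round(ad_normal(3.0), 4))
```

The estimate does not depend on the value of sigma, which reflects the standardization built into the weights (4).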


(b) Triangular, trapezoidal, rectangular and 'bi-spiked'
distributions.

Since the derivation of results for these distributions
is straightforward as in (a), the results are only tabulated
in Table 1. Refer to Figure 1 for the shapes of these
distributions.


Observation: The more widespread the distribution is, in
the sense of its span relative to its standard deviation, the
larger is AD. Thus, in the 'bi-spiked' case, where the density
is concentrated entirely at the two extremes, AD is unity,
the upper bound given in Section 2.1.
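The trend in this observation can be illustrated numerically. The values below are computed here and are not quoted from Table 1; the sketch assumes symmetric triangular and rectangular shapes on [-A, A] (spread 2A, as in the captions of Figure 1) and two equal spikes at -A and +A. The helper name `ad_of_density` is hypothetical.

```python
import math

def ad_of_density(pdf, lo, hi, steps=100_000):
    """AD = E|x - mu| / sigma for a univariate density on [lo, hi],
    estimated with the midpoint rule."""
    h = (hi - lo) / steps
    xs = [lo + (k + 0.5) * h for k in range(steps)]
    ws = [pdf(x) * h for x in xs]
    mass = sum(ws)
    mean = sum(x * w for x, w in zip(xs, ws)) / mass
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, ws)) / mass
    mad = sum(abs(x - mean) * w for x, w in zip(xs, ws)) / mass
    return mad / math.sqrt(var)

A = 1.0  # half-spread; AD does not depend on its value
triangular = ad_of_density(lambda x: (A - abs(x)) / A ** 2, -A, A)
rectangular = ad_of_density(lambda x: 1.0 / (2.0 * A), -A, A)
# Two equal point masses at -A and +A: E|x| = A and sigma = A.
bi_spiked = (0.5 * A + 0.5 * A) / A

print(round(triangular, 3), round(rectangular, 3), bi_spiked)
```

Under these assumptions the ordering is triangular (about 0.816), rectangular (about 0.866), bi-spiked (1), with the normal value 0.80 of (9) below all three.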





3. AD FOR SOME COMPOSITE UNIVARIATE NORMAL DISTRIBUTIONS

This section addresses the decision on the splitting
of a cluster by studying its AD. Assuming that samples from
two distributions, whether they be close to or far apart from
each other, have been initially taken to belong to the same
cluster, i.e., to a composite distribution, the question is:
How will the value of AD provide the information as to
whether or not it is advisable to split the cluster into two?
The following will study a few combinations of two normal
distributions $p_1(\cdot)$ and $p_2(\cdot)$ with various variances and
a priori probabilities $\pi_1$ and $\pi_2$, i.e.,
$p(\cdot) = \pi_1 p_1(\cdot) + \pi_2 p_2(\cdot)$. See Figure 2.

Case (a)

$\pi_1 = \pi_2 = 1/2$, $p_1 \sim N(0,1)$, $p_2 \sim N(-\Delta, 1)$

$$p(x) = \frac{1}{2} p_1(x) + \frac{1}{2} p_2(x)
= \frac{1}{2\sqrt{2\pi}} \Big[ e^{-x^2/2} + e^{-(x+\Delta)^2/2} \Big].$$

It is simple to show that

$$\mu = -\Delta/2$$

$$\sigma^2 = (4 + \Delta^2)/4$$

$$AD = \frac{1}{\sigma} \Big[ \sqrt{\frac{2}{\pi}}\, e^{-\Delta^2/8}
+ \frac{\Delta}{2} \big[ 1 - 2 P_N(-\Delta/2) \big] \Big] \tag{10}$$
where

$$P_N(\alpha) = \int_{-\infty}^{\alpha} p_N(x)\, dx, \qquad p_N \sim N(0,1). \tag{11}$$

Values of these parameters for various $\Delta$ are tabulated in
Table 2(a) together with the shape of the composite
distributions.
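Equation (10) is easy to evaluate directly. The following is a minimal sketch, assuming the standard normal distribution function $P_N$ of (11) is computed through `math.erf`; the names `phi_cdf` and `ad_case_a` are illustrative only.

```python
import math

def phi_cdf(a):
    """P_N(a) of (11): standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def ad_case_a(delta):
    """AD of (10) for two equally weighted unit-variance normals whose
    means are delta apart."""
    sigma = math.sqrt((4.0 + delta ** 2) / 4.0)
    spread = math.sqrt(2.0 / math.pi) * math.exp(-delta ** 2 / 8.0)
    shift = (delta / 2.0) * (1.0 - 2.0 * phi_cdf(-delta / 2.0))
    return (spread + shift) / sigma

for delta in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(delta, round(ad_case_a(delta), 3))
```

At delta = 0 the composite reduces to a single normal and the value 0.80 of (9) is recovered; for large separations the value moves toward the bound 1.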


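Case (b)

$\pi_1 = \pi_2 = 1/2$, $p_1 \sim N(0,2)$, $p_2 \sim N(-\Delta, 1)$:

$$p(x) = \frac{1}{2\sqrt{2\pi}} \Big[ \frac{1}{\sqrt{2}}\, e^{-x^2/4} + e^{-(x+\Delta)^2/2} \Big].$$

It can be similarly shown that

$$\mu = -\Delta/2$$

$$\sigma^2 = (6 + \Delta^2)/4$$

$$AD = \Big[ \frac{1}{2} AD_1 + \frac{1}{2} AD_2 \Big] / \sigma$$

where

$$AD_1 = \int_{-\infty}^{\infty} |x - \mu|\, p_1(x)\, dx
= \mu \big[ 2 P_N(\mu/\sqrt{2}) - 1 \big] + \frac{2}{\sqrt{\pi}}\, e^{-\mu^2/4}$$

and

$$AD_2 = \int_{-\infty}^{\infty} |x - \mu|\, p_2(x)\, dx
= (\mu + \Delta) \big[ 2 P_N(\mu + \Delta) - 1 \big] + \frac{2}{\sqrt{2\pi}}\, e^{-(\mu+\Delta)^2/2}, \tag{12}$$

where $P_N(\cdot)$ is given by (11). Values of these parameters
for various $\Delta$ are again tabulated in Table 2(b).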

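Case (c)

$\pi_1 = 2/3$, $\pi_2 = 1/3$, $p_1 \sim N(0,2)$, $p_2 \sim N(-\Delta, 1)$

Similarly,

$$\mu = -\Delta/3$$

$$\sigma^2 = (15 + 2\Delta^2)/9$$

$$AD = \Big[ \frac{2}{3} AD_1 + \frac{1}{3} AD_2 \Big] / \sigma, \tag{13}$$

where $AD_1$ and $AD_2$ are given in (12). Table 2(c) shows the
values of these parameters for different $\Delta$.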
Case (d)

$\pi_1 = 3/4$, $\pi_2 = 1/4$, $p_1 \sim N(0,2)$, $p_2 \sim N(-\Delta, 1)$

Similarly,

$$\mu = -\Delta/4$$

$$\sigma^2 = (28 + 3\Delta^2)/16$$

$$AD = \Big[ \frac{3}{4} AD_1 + \frac{1}{4} AD_2 \Big] / \sigma, \tag{14}$$

where $AD_1$ and $AD_2$ are given in (12). Table 2(d) shows the
values of these parameters for different $\Delta$.
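Cases (b), (c) and (d) differ only in the a priori probabilities, so they can be evaluated with one routine. The sketch below is an illustration, not the original computation: it uses the general mixture formulas for the mean and variance (which reduce to the expressions above for the three cases), and the function names are hypothetical.

```python
import math

def phi_cdf(a):
    """P_N(a) of (11): standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def composite_parameters(pi1, delta):
    """Mean, standard deviation and AD of pi1*N(0,2) + (1-pi1)*N(-delta,1),
    using AD_1 and AD_2 of (12)."""
    pi2 = 1.0 - pi1
    mu = -pi2 * delta
    # Mixture variance; equals (6 + delta^2)/4 for pi1 = 1/2, and so on.
    sigma = math.sqrt(2.0 * pi1 + 1.0 * pi2 + pi1 * pi2 * delta ** 2)
    ad1 = (mu * (2.0 * phi_cdf(mu / math.sqrt(2.0)) - 1.0)
           + 2.0 / math.sqrt(math.pi) * math.exp(-mu ** 2 / 4.0))
    s = mu + delta
    ad2 = (s * (2.0 * phi_cdf(s) - 1.0)
           + 2.0 / math.sqrt(2.0 * math.pi) * math.exp(-s ** 2 / 2.0))
    return mu, sigma, (pi1 * ad1 + pi2 * ad2) / sigma

for label, pi1 in (("b", 1 / 2), ("c", 2 / 3), ("d", 3 / 4)):
    for delta in (2.0, 3.0, 4.0):
        mu, sigma, ad = composite_parameters(pi1, delta)
        print(label, delta, round(mu, 2), round(sigma, 2), round(ad, 2))
```

For case (b) with a separation of 2, for example, this gives a mean of -1, a standard deviation of about 1.58 and an AD of about 0.81.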

Observations

1. Comparing the shape of each composite distribution with
the accompanying value of AD, it can be concluded that
when AD > 0.84, the distributions discussed so far
have two distinct "humps". This threshold value 0.84 of
AD thus appears to be a sound choice for deciding whether
a cluster should be split or not.

2. When a cluster is split, the examples above indicate
that a good universal choice of the two new cluster
centers is at $\pm\sigma$ from the old cluster center $\mu$, i.e.,
at $\mu \pm \sigma$. This is due to the fact that the points
$\mu \pm \sigma$ are quite close to the means of $p_1(\cdot)$ and
$p_2(\cdot)$. Indeed, this conclusion conforms with common
practice.

3. The following section, which discusses errors of
misclassification, will further warrant this value 0.84 of AD.
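Observations 1 and 2 together suggest a simple univariate split step, sketched below. The function name `split_if_needed` and the two sample clusters are hypothetical; this is not the ISODATA code, only an illustration of the rule "split when AD reaches the threshold, and place the new centers one standard deviation either side of the old mean".

```python
import math

def split_if_needed(xs, threshold=0.84):
    """Return the two proposed centers (mu - sigma, mu + sigma) when the
    univariate AD reaches the threshold, otherwise None."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    ad = sum(abs(x - mu) for x in xs) / (n * sigma)
    if ad >= threshold:
        return (mu - sigma, mu + sigma)
    return None

two_groups = [-2.0] * 50 + [0.0] * 50            # two tight groups
single_group = [-2, -1, -1, 0, 0, 0, 0, 1, 1, 2]  # one hump

print(split_if_needed(two_groups))
print(split_if_needed(single_group))
```

In the idealized two-group sample the proposed centers coincide with the two group locations, which is the limiting form of the closeness noted in Observation 2.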
4. PROBABILITY OF MISCLASSIFICATION

This section investigates the following dilemma. When
a composite distribution is made up of two normal
distributions with their means at a distance $\Delta$ apart,
then even if these two distributions are identified (i.e.,
even if the decision is made to split the cluster), the
reassignment of points (from the old cluster) to their
respective distributions (i.e., new clusters) might not be
easy, or the assignment might involve too much error. If the
probability of misclassification becomes too high, it might
be more justified to consider that only one cluster exists
(with a distorted normal distribution) than to consider the
co-existence of two clusters (each coming from a normal
distribution).

The following will calculate the probability of
misclassification $P_E$ that arises when the points are
reassigned according to the Bayes decision rule maximizing
the a posteriori probability. This is a familiar technique to
communication engineers (see ref. [4]), and is summarized as
follows:

Bayes Rule: Assign a point x to the distribution $p_1(\cdot)$ if
$\pi_1 p_1(x) > \pi_2 p_2(x)$. Otherwise, assign it to the
distribution $p_2(\cdot)$.

Referring to Figure 2, the Bayes Rule says that any point x
with $x > -\theta$ will be assigned to distribution $p_1(\cdot)$, and
$x < -\theta$ assigned to distribution $p_2(\cdot)$, where $\theta$ is
such that

$$\pi_1 p_1(-\theta) = \pi_2 p_2(-\theta). \tag{15}$$

Thus, the error of misclassification $P_E$ is

$$P_E = \pi_1 \int_{-\infty}^{-\theta} p_1(x)\, dx + \pi_2 \int_{-\theta}^{\infty} p_2(x)\, dx. \tag{16}$$


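Straightforward calculation will show for the four cases
studied in Section 3:

Case (a)

$$\theta = \Delta/2$$

$$P_E = P_N(-\theta), \tag{17}$$

where $P_N(\cdot)$ is given in (11).

Case (b)

$$\theta = 2\Delta - \sqrt{2\Delta^2 + 4\phi}, \qquad \phi = \ln \sqrt{2}$$

$$P_E = \frac{1}{2\sqrt{2}}\, P_N(-\theta/\sqrt{2}) + \frac{1}{2} \big[ 1 - P_N(\Delta - \theta) \big]. \tag{18}$$

Case (c)

$$\theta = 2\Delta - \sqrt{2\Delta^2 - 4\phi}, \qquad \phi = \ln \sqrt{2}$$

$$P_E = \frac{2}{3\sqrt{2}}\, P_N(-\theta/\sqrt{2}) + \frac{1}{3} \big[ 1 - P_N(\Delta - \theta) \big]. \tag{19}$$

Case (d)

$$\theta = 2\Delta - \sqrt{2\Delta^2 - 4\phi}, \qquad \phi = \ln (3/\sqrt{2})$$

$$P_E = \frac{3}{4\sqrt{2}}\, P_N(-\theta/\sqrt{2}) + \frac{1}{4} \big[ 1 - P_N(\Delta - \theta) \big]. \tag{20}$$

Values of $\theta$ and $P_E$ for different $\Delta$ are tabulated in
Table 2(a)-(d).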
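The decision points of cases (b) to (d) follow from one quadratic in x obtained by taking logarithms of (15). The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the single constant `c` plays the role of plus or minus $\phi$ in the three cases. It also checks that the weighted densities really cross at $x = -\theta$.

```python
import math

def theta(pi1, delta):
    """Decision point for pi1*N(0,2) against (1-pi1)*N(-delta,1): the root
    of (15) lying between the two means, returned as a positive theta so
    that the boundary sits at x = -theta.
    Raises ValueError (from math.sqrt) if the densities do not cross."""
    pi2 = 1.0 - pi1
    c = math.log(pi1 / (pi2 * math.sqrt(2.0)))
    return 2.0 * delta - math.sqrt(2.0 * delta ** 2 - 4.0 * c)

def weighted_p1(pi1, x):
    """pi1 times the N(0, 2) density."""
    return pi1 * math.exp(-x * x / 4.0) / math.sqrt(4.0 * math.pi)

def weighted_p2(pi1, delta, x):
    """(1 - pi1) times the N(-delta, 1) density."""
    return (1.0 - pi1) * math.exp(-(x + delta) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

for label, pi1 in (("b", 1 / 2), ("c", 2 / 3), ("d", 3 / 4)):
    for delta in (2.0, 3.0, 4.0):
        t = theta(pi1, delta)
        gap = weighted_p1(pi1, -t) - weighted_p2(pi1, delta, -t)
        print(label, delta, round(-t, 2), abs(gap) < 1e-12)
```

With unequal variances the two weighted densities cross a second time far out in the tail; only the crossing between the two means is used here.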

Observations

1. First, it is noted [5] that in the classification of,
say, agricultural data by remote sensing techniques,
a 90% correct classification is considered to be good
under the existing state of the art. In other words, an
error of up to 10% is permissible.

2. Examination of Table 2 reveals that if a composite
distribution with $AD \ge 0.84$ is split into two normal
distributions, the error of misclassification $P_E$ in
the process of reassigning the points of the old cluster
to the two new clusters will be less than 10%. This
means that splitting of the cluster is justified.
Otherwise, if AD < 0.84, it is more advisable to
retain the old cluster and assume that it originates
from a distorted normal distribution.

3. The threshold 0.84 of AD is thus further justified.



5. EXTENSION TO MULTIVARIATE CASE

It is readily seen that extension of the results
obtained in the previous sections to the multivariate case
is nontrivial if not impossible. However, the evaluation
of AD for the simplest multivariate normal distribution is
still possible when the components of the N-vector are
assumed to be mutually uncorrelated (and hence independent).
The value of AD obtained for this distribution will certainly
serve as a guideline for the values of AD for other simple
or composite distributions. Furthermore, it is recalled
that the bound on AD has been shown to be $\sqrt{N}$. Thus, an
empirical threshold for AD can be extrapolated.

5.1 AD For Multivariate Normal Distribution

Assume the following probability density:

(21)

N = 1: As shown in Section 2.2,

$$AD = \frac{2}{\sqrt{2\pi}} \approx 0.80. \tag{22}$$

N = 2:

N = 4: Similarly,

(25)

Curve (a) in Figure 3 shows how $AD/\sqrt{N}$ varies with N for
this normal distribution.

5.2 Empirical Threshold

Since 0.84 has been shown to be a sound threshold for
the univariate case N = 1, an empirical value of AD can
be extrapolated for the multivariate case. The empirical
formula

$$\text{Threshold} = (N - 0.16)/\sqrt{N} \tag{26}$$

appears to be a reasonable rule, as plotted in Figure 3,
curve (b). Typically then, the threshold for AD when N = 3
is $(3 - 0.16)/\sqrt{3} = 1.64$.
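Formula (26) can be tabulated directly. The sketch below uses a hypothetical helper name, `split_threshold`, and also prints the threshold relative to the bound $\sqrt{N}$, which equals 1 - 0.16/N.

```python
import math

def split_threshold(n_dims):
    """Empirical splitting threshold (26) for an n_dims-dimensional space."""
    return (n_dims - 0.16) / math.sqrt(n_dims)

for n_dims in (1, 2, 3, 4, 8, 16):
    t = split_threshold(n_dims)
    print(n_dims, round(t, 2), round(t / math.sqrt(n_dims), 3))
```

The second printed column shows how quickly the threshold itself closes in on the bound as N grows.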
5.3 Critical Observation

From Figure 3, it is observed that the value of AD for
a simple multivariate distribution (as given by (21))
approaches its limit $\sqrt{N}$ reasonably fast with N. In fact,
examining the forms of (22)-(25), the factor $r^{N-1}$ in the
integral is readily recognized to be the factor which drives
the value of AD for any reasonably smooth distribution (which
possesses the first absolute moment) to its bound $\sqrt{N}$. This
is by virtue of the fact that, for N large, the integral

$$2 \int_{0}^{\infty} r^{N} e^{-r^2/2}\, dr \tag{27}$$

equals
$2 \int_{0}^{\infty} r \big( r^{N-1} e^{-r^2/2} \big)\, dr = \int_{-\infty}^{\infty} |x|\, p(x)\, dx$,
where p(x) will approximate the 'bi-spiked' distribution as
shown in Figure 1(e), with A = 1. Consequently, (27) and thus
AD approaches its bound.

CRITICISM: With the above observation, it is asserted that
the parameter AD will lose its effectiveness as a
discriminative parameter to decide on the splitting of a
cluster when N is large. This is because the AD of any
(reasonable) distribution approaches $\sqrt{N}$, which makes it
impossible to detect the structure of the distribution from
the value of AD.
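The criticism can be made concrete with a small computation. The sketch below is not taken from the paper's equations: it assumes a standardized normal cluster with independent coordinates, for which AD is the mean radial distance, and uses the standard result that this mean is the mean of a chi distribution with N degrees of freedom, $\sqrt{2}\,\Gamma((N+1)/2)/\Gamma(N/2)$. The function names are hypothetical.

```python
import math

def ad_standard_normal(n_dims):
    """Mean radial distance of an n_dims-dimensional standard normal
    (mean of the chi distribution), computed through log-gamma so that
    large n_dims does not overflow."""
    log_ratio = math.lgamma((n_dims + 1) / 2.0) - math.lgamma(n_dims / 2.0)
    return math.sqrt(2.0) * math.exp(log_ratio)

def split_threshold(n_dims):
    """Empirical threshold (26)."""
    return (n_dims - 0.16) / math.sqrt(n_dims)

for n_dims in (1, 2, 4, 10, 100):
    bound = math.sqrt(n_dims)
    normal_ratio = ad_standard_normal(n_dims) / bound
    threshold_ratio = split_threshold(n_dims) / bound
    print(n_dims, round(normal_ratio, 4), round(threshold_ratio, 4),
          round(threshold_ratio - normal_ratio, 4))
```

Under these assumptions the margin between a single normal cluster and the threshold, measured relative to the bound, shrinks from about 0.04 at N = 1 to under 0.001 at N = 100, which is the loss of discriminating power described above.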


6. CONCLUSION

The parameter AD as used in the ISODATA program is
critically examined. Thresholds of AD to decide on the
splitting of clusters are obtained. For the univariate
case, 0.84 is established as a sound choice, after examining
several simple as well as composite distributions and also
after investigating the probability of misclassification when
points have to be reassigned to the newly identified clusters.
For the multivariate case, the empirical threshold
$(N - 0.16)/\sqrt{N}$ is extrapolated. A final criticism of AD is
that it loses its effectiveness as a discriminative measure
for the present purpose when N is large.
TABLE CAPTIONS

1. Tabulation of the probability density, mean, standard
deviation and AD for the various distributions of
Figure 1.

2. Tabulation with respect to $\Delta$ of the mean, standard
deviation, AD, $\theta$ and $P_E$ of the composite distribution
$p(\cdot) = \pi_1 p_1(\cdot) + \pi_2 p_2(\cdot)$ with

(a) $\pi_1 = \pi_2 = 1/2$, $p_1 \sim N(0,1)$, $p_2 \sim N(-\Delta,1)$

(b) $\pi_1 = \pi_2 = 1/2$, $p_1 \sim N(0,2)$, $p_2 \sim N(-\Delta,1)$

(c) $\pi_1 = 2/3$, $\pi_2 = 1/3$, $p_1 \sim N(0,2)$, $p_2 \sim N(-\Delta,1)$

and (d) $\pi_1 = 3/4$, $\pi_2 = 1/4$, $p_1 \sim N(0,2)$, $p_2 \sim N(-\Delta,1)$.



FIGURE CAPTIONS

1. Some simple distributions: (a) Normal $N(0, \sigma^2)$,
(b) Triangular with spread 2A, (c) Trapezoidal with spread
4A, (d) Rectangular with spread 2A and (e) 'Bi-spiked'
with equal weights spread 2A apart.

2. A composite distribution $p(\cdot)$ made up of two normal
distributions $p_1(\cdot)$ and $p_2(\cdot)$ with a priori probabilities
$\pi_1$ and $\pi_2$.

3. Variation of $AD/\sqrt{N}$ with N for (a) Normal distribution,
(b) The suggested threshold: $AD = (N - 0.16)/\sqrt{N}$.
[Table 1: columns Normal, Triangular, Trapezoidal, Rectangular, 'Bi-Spiked'; entries not recoverable]

          Section 3                      Section 4
  μ             σ       AD         -θ            P_E
  -1            1.58    0.81       -0.94         0.162
  [illegible]   1.94    0.84       [illegible]   0.087
  -2            2.34    0.88       -2.22         0.039

Table 2(b)

          Section 3                      Section 4
  μ             σ       AD         -θ            P_E
  [illegible]   1.60    0.82       -1.43         0.169
  -1            1.91    0.84       -1.92         0.086
  [illegible]   2.29    0.84       -2.46         0.042

Table 2(c)